Comparative mapping
Abstract
Comparative genetic mapping studies reveal similarities and differences in gene content and gene order between genera belonging to different taxa. A pressing need in legume genomics is to connect the knowledge gained from studying model legume genomes with the biological and agronomic questions that matter in crop species. It is also important to know whether similar features prevail in other plant families, in particular because the extent of such differences may define the limits of comparative structural genomics as a strategy for applied agriculture. In the following sections we provide protocols for developing gene-based markers, genotyping and mapping in M. truncatula and other legume species.

1. Selection of parental plants and development of mapping populations
As a first step towards a linkage map, the parental individuals of a mapping population have to be selected. When selecting the parents, keep in mind that more distantly related plants are more likely to yield a highly polymorphic mapping population (e.g. an interspecific mapping population), which makes it easier to position markers on the genetic map later (in contrast to intraspecific mapping populations). It is also convenient if the candidate parents differ morphologically (e.g. in flower color or height), since such differences provide an easy way to confirm that the F1 progeny resulted from cross-pollination. Selecting diploid parental plants facilitates the genetic studies later performed on the mapping population (e.g. diploid Medicago truncatula, diploid Medicago sativa ssp. quasifalcata and ssp. caerulea). After the parental plants have been selected, different kinds of mapping populations can be generated, depending on the number of generations produced after the F1.
These can be F2 populations, RIL (Recombinant Inbred Line) populations, backcross populations or NIL (Near-Isogenic Line) populations, each with particular features that suit it to different situations and purposes.

2. Genomic DNA isolation
2.1. Homogenizing plant tissue
Harvest different organs from the plant: roots, leaves, floral organs (to obtain a large amount of DNA, young leaves are preferable). Place the harvested organs in labeled paper bags and keep them on regular ice during harvest so that they stay cool. For immediate grinding, place the leaf sample in a mortar or another container that can hold liquid nitrogen, and quick-freeze it in liquid nitrogen. Once frozen, do not allow samples to thaw until isolation! Pre-chill the mortar and pestle, then grind the tissue in liquid nitrogen as finely as possible. Grinding large amounts of tissue at once results in coarse grinding and greatly reduces yields. If the harvested samples are not processed immediately, they can be stored at -80 °C in airtight bags for a few days. For later DNA isolation, the harvested plant organs can also be lyophilized and stored in a labeled paper envelope.

2.2. DNA isolation from ground tissue
Various ready-to-use isolation kits can be purchased, and the protocols supplied with them followed to obtain pure DNA. If no such kit is used, several convenient protocols can be followed instead.
Comparative mapping page 2 of 21 Medicago truncatula handbook version November 2006
Isolation using CTAB extraction buffer:
Collect the fresh plant material (20-50 mg) in 1.5 ml Eppendorf tubes. Add 100 mg quartz sand and homogenize the fresh tissue with the sand, then add 300 μl CEP buffer and vortex briefly. Incubate for 60 min at 65 °C with continuous gentle rocking. Do not exceed 75 minutes, as DNA yield will be compromised.
Remove the tubes from incubation and add 600 μl chloroform. Rock the samples gently for 30 min at room temperature to mix. Centrifuge for 10 min at 13,000 rpm at room temperature. Transfer the supernatant to a tube containing 520 μl isopropanol (2-propanol), mix gently and keep at -80 °C for 20 minutes. Spin down the samples and remove the isopropanol, repeat this step, and dissolve the pellet in 150 μl of 0.1 mg/ml RNase. Incubate the samples for 2 hours at 37 °C.
NH4Ac/SDS cleaning: add 150 μl TES to the 150 μl DNA, vortex and incubate for 10 minutes at room temperature; add 150 μl 7.5 M NH4Ac, vortex, incubate for 10 minutes at 20 °C, then centrifuge for 5 minutes at 13,000 rpm. Transfer the supernatant to a tube containing 750 μl ethanol, vortex and incubate at -20 °C for 20 minutes. Centrifuge for 5 minutes at 13,000 rpm and remove the supernatant; repeat the centrifugation, carefully remove all remaining supernatant, then dry the DNA pellet. Dissolve the pellet in a volume of 1x TE (in μl) twice the amount of plant tissue (in mg) initially used for the isolation.
Buffers:
CEP: 2% CTAB, 100 mM Tris, 20 mM EDTA, 1.4 M NaCl, 0.5% β-mercaptoethanol
TES: 100 mM Tris-HCl (pH 7.6), 10 mM EDTA, 1% SDS
1x TE (10:1): 10 mM Tris (pH 7.5), 1 mM EDTA

3. Quantification of the extracted DNA
3.1. Quantification using a spectrophotometer
Calibrate the instrument as described in the user manual. Use water or TE as the reference (blank): aliquot 1 ml of either into the cuvette and load the spectrophotometer. Measure the sample diluted 1:100, taking readings at A260 and A280.
DNA concentration (μg/ml) = 50 x A260 x dilution factor
RNA concentration (μg/ml) = 40 x A260 x dilution factor
260/280 ratio:
1.6-1.8: absorption due to DNA
1.6 or less: protein contamination
2.0 or more: chloroform or phenol contamination
3.2. DNA concentration can also be approximated using mass rulers in which the amount of DNA in each band is known.
DNA samples are run on an agarose gel next to the mass ruler, and the concentration is estimated by comparing the sample bands with the band amounts given in the ladder description.

4. Gene-specific primer design for PCR amplification
4.1. The intron targeting method
An efficient strategy for generating gene-specific markers for mapping in plants is the Intron Targeting (IT) method. IT primer pairs are complementary to the exon sequences flanking the targeted intron. Since intron sequences are generally less conserved than exons, the amplified product may display polymorphism due to length or nucleotide variation among the introns of different alleles of the gene. At the same time, the higher sequence conservation of the exons ensures that all alleles can be amplified effectively. If a single targeted intron is too short, primers may be designed to match the exons flanking two introns and an internal exon, which increases the chance of detecting length polymorphism. The method requires that the genomic region harboring the gene has been sequenced and that mRNA, an assembled EST consensus, or at least EST sequences are also available. EST consensi can be obtained from TIGR: the Gene Index databases contain such Tentative Consensus (TC) sequences (ftp://ftp.tigr.org/pub/data/tgi/). Throughout this chapter, mRNA, TC and EST sequences are referred to as cDNA sequences.
Here we describe a computational method for designing primers that can be used to generate IT markers for mapping studies. All software programs used are freely available and can easily be built into a processing pipeline. We also present web applications built to carry out the last steps of the process, i.e. exon selection and primer design.

4.1.1. Design of intron targeting primers
The method used to find the intron to be targeted and its flanking exons depends on whether cDNA and genomic sequences are both available from the same species (in this case M. truncatula). If the genomic region has not been sequenced yet, the sequence of a homologous (preferably orthologous) gene from another species can help.
cDNA and genomic sequences exist from the same species
The method described here uses the cDNA sequence of the targeted gene and all genomic sequences of the same species. We use the blastn search mode (Altschul et al. 1997) of the blastall program from NCBI (ftp://ftp.ncbi.nih.gov/blast/executables/), with the E-value threshold set to 10, to find the genomic region coding for the gene, and extract the sequence of that region. To locate the precise exon boundaries (i.e. the positions of the introns) within the cDNA sequence, we use the sim4 program (Florea et al. 1998) (http://globin.cse.psu.edu/html/docs/sim4.html). Sim4 rapidly aligns a spliced transcript sequence to its parent genomic sequence and attempts to find the correct exon–intron junctions. The program generates a list of exons and their positions on both sequences. Fig. 1 shows an example sim4 result containing the list and an optional alignment. If the gene contains introns, their positions within the cDNA sequence can be determined. Although sim4 may miss small marginal exons, usually at least one intron suitable for targeting can be identified for a gene.
Figure 1: List of exons and optional alignment in the output of sim4. A: exon positions; A1: exon positions on the cDNA; A2: exon positions on the genomic sequence; A3: sequence similarity; B: spliced alignment (partially shown); B1: place of intron.
Having located an intron, the joined sequences of the flanking exons can be passed to a primer design program.
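The step from the sim4 exon list to intron positions can be sketched in Python as follows. This is a minimal sketch under stated assumptions: exon lines are taken to have the form `1-245  (1001-1245)   100% ->` (cDNA coordinates, genomic coordinates in parentheses, percent identity), the alignment is on the plus strand, and the function and type names are hypothetical, not part of sim4.

```python
import re
from typing import List, NamedTuple

# Assumed sim4 exon line: "cstart-cend  (gstart-gend)  identity% [arrow]"
EXON_LINE = re.compile(r"^\s*(\d+)-(\d+)\s+\((\d+)-(\d+)\)\s+(\d+)%")


class Exon(NamedTuple):
    cdna_start: int
    cdna_end: int
    genomic_start: int
    genomic_end: int
    identity: int


class Intron(NamedTuple):
    cdna_junction: int   # last cDNA base of the upstream exon
    genomic_start: int   # first intron base on the genomic sequence
    genomic_end: int     # last intron base on the genomic sequence
    length: int


def parse_sim4_exons(text: str) -> List[Exon]:
    """Collect exon coordinates from sim4 output, skipping all other lines."""
    exons = []
    for line in text.splitlines():
        match = EXON_LINE.match(line)
        if match:
            exons.append(Exon(*(int(g) for g in match.groups())))
    return exons


def introns_from_exons(exons: List[Exon]) -> List[Intron]:
    """Each intron lies between two consecutive exons (plus strand assumed)."""
    introns = []
    for upstream, downstream in zip(exons, exons[1:]):
        start = upstream.genomic_end + 1
        end = downstream.genomic_start - 1
        introns.append(Intron(upstream.cdna_end, start, end, end - start + 1))
    return introns
```

For three exon lines `1-245 (1001-1245)`, `246-520 (1400-1674)` and `521-800 (1800-2079)`, the sketch reports introns after cDNA positions 245 and 520, of 154 and 125 bp. The junction position is what the primer program needs as its target, and the intron length is what it later adds to the predicted product size. Minus-strand alignments are not handled here.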
A target position corresponding to the position of the intron between the two exons is specified, and the design program is instructed to find a forward primer in the 5’ exon and a reverse primer in the 3’ exon. The size of the PCR product is predicted by adding the distance between the primer positions to the length of the intron. If an amplifiable product size can also be predicted for two exons that are not adjacent in the cDNA, the same procedure is carried out for such an exon pair, the target being two introns and the exon between them. We prefer a product size within the range of 300–2000 bp and use filtering parameters to select suitable exons accordingly. To generate the primer pairs we use the Primer3 program (Rozen and Skaletsky 2000) (http://frodo.wi.mit.edu/primer3/primer3_code.html), since it can be built into a processing pipeline. The program takes as input a text file containing the sequence and the parameters (e.g. target position, requested product size).
Plant genomes contain many multigene families. Since IT markers are based on primers matching the relatively conserved protein-coding exons of genes, amplification of other members of a multigene family cannot be excluded. Therefore, single- or low-copy genes should preferably be chosen as targets. The copy number of a gene can be predicted computationally by clustering homologous cDNA sequences. For a species in which extensive EST sequencing and generation of tentative EST consensi (TC) have been performed, the number of non-identical sequences in a cDNA/TC cluster approximates the copy number of the gene.
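The cluster-based copy-number estimate can be sketched as single-linkage clustering with a union-find structure. This is a minimal sketch under stated assumptions: the all-against-all blastn results have already been reduced to (query, subject, percent identity) triples, the similarity threshold defaults to 80% and the identity threshold to 98% (following the thresholds the text gives for this step), and all names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, str, float]  # (query id, subject id, percent identity)


class UnionFind:
    """Disjoint sets: linking two members merges their whole groups."""

    def __init__(self, items: Iterable[str]) -> None:
        self.parent = {item: item for item in items}

    def find(self, x: str) -> str:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def groups(uf: UnionFind) -> List[List[str]]:
    members: Dict[str, List[str]] = defaultdict(list)
    for item in uf.parent:
        members[uf.find(item)].append(item)
    return [sorted(m) for m in members.values()]


def estimate_copy_numbers(
    ids: Iterable[str],
    hits: Iterable[Hit],
    similarity: float = 80.0,
    identity: float = 98.0,
) -> List[Tuple[List[str], int]]:
    """Return (cluster members, number of non-identical sequences) pairs."""
    ids = list(ids)
    clusters = UnionFind(ids)   # single linkage at the similarity threshold
    same_seq = UnionFind(ids)   # sequences treated as identical
    for a, b, pid in hits:
        if a == b:
            continue  # self-hits from the all-against-all search
        if pid >= similarity:
            clusters.union(a, b)
        if pid >= identity:
            same_seq.union(a, b)
    result = []
    for cluster in groups(clusters):
        distinct = {same_seq.find(s) for s in cluster}
        result.append((cluster, len(distinct)))
    return result
```

Sequences with no hit above the similarity threshold form singleton clusters with an estimated copy number of 1, which marks them as good IT targets. Because blastn percent identity refers only to the aligned region, a real pipeline would probably also require a minimum alignment length before linking two sequences.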
When two homologous sequences are compared, two sequence-matching thresholds are used: an identity threshold to distinguish non-identical sequences from identical ones, and a similarity threshold to decide whether two sequences belong to the same cluster. To account for EST sequencing errors, the identity threshold must be set 1–2% below 100% identity. The similarity threshold is usually set to 80%. An all-against-all similarity search using blastn provides the pairwise similarity information, which can be post-processed with a simple single-linkage clustering algorithm.
cDNA and genomic sequences are from different species
cDNA from the target species and genomic sequences from a helper species
If the genomic sequence of the targeted gene is not available from the species of interest, the homologous genomic region from a related species may make it possible to predict the positions of the introns within the cDNA sequence and to estimate the intron lengths. To obtain reliable predictions, two criteria must be met: the proteins encoded by the genes should be sufficiently similar, and the exon/intron structures of the orthologous genes should correspond to each other. To satisfy these criteria, the two species have to be evolutionarily close. The cDNA sequence of the gene of interest is used as a query to find the homologous (orthologous) gene in the genomic sequence of the second species. To design primers for M. truncatula, for example, another legume or Arabidopsis thaliana can serve as the ‘helper’ species. A database of the genomic sequences of the second species can be searched using either blastn or tblastx (E-value threshold: 10 in both cases). Although tblastx results must be evaluated with caution, since E-value filtering cannot completely eliminate artifacts, the higher sensitivity of a search at the protein level may justify its use.
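The database search and the extraction of the best-matching genomic region can be sketched as follows. The sketch assumes that blastall was run with tabular output (`-m 8`), whose twelve columns are query id, subject id, percent identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, E-value and bit score. Summing bit scores per subject and padding the region with a flank (a hypothetical default of 500 bp) are illustrative choices, not part of the published pipeline.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def parse_m8(lines: Iterable[str]) -> List[dict]:
    """Parse BLAST tabular (-m 8) lines into dictionaries."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a complete tabular record
        hits.append({
            "query": fields[0], "subject": fields[1],
            "sstart": int(fields[8]), "send": int(fields[9]),
            "evalue": float(fields[10]), "bitscore": float(fields[11]),
        })
    return hits


def best_genomic_region(
    hits: List[dict], max_evalue: float = 10.0, flank: int = 500
) -> Optional[Tuple[str, int, int]]:
    """Keep hits passing the E-value threshold, pick the subject with the
    highest summed bit score and return (subject, start, end), padded."""
    kept = [h for h in hits if h["evalue"] <= max_evalue]
    if not kept:
        return None
    by_subject: Dict[str, List[dict]] = defaultdict(list)
    for h in kept:
        by_subject[h["subject"]].append(h)
    subject = max(by_subject,
                  key=lambda s: sum(h["bitscore"] for h in by_subject[s]))
    # min/max also handle minus-strand hits, where sstart > send
    coords = [c for h in by_subject[subject] for c in (h["sstart"], h["send"])]
    start = max(1, min(coords) - flank)
    end = max(coords) + flank  # may need clipping to the subject length
    return subject, start, end
```

On long genomic subjects (whole BACs or chromosomes) the hits would probably need to be grouped by proximity first, so that distant paralogous matches on the same subject are not merged into one region.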
The genomic region coding for the helper gene must be extracted and aligned to the query cDNA. The sim4 program is designed to compare nearly identical sequences that differ only in the presence or absence of introns, so its use for inter-species comparison is very limited. Taking advantage of the higher similarity at the protein level, genomic DNA can instead be aligned to a protein sequence, even a heterologous one (i.e. from a different species), using the genewise program of the Wise2 package (Birney et al. 2004) (http://www.ebi.ac.uk/Wise2/). Both sim4 and genewise take the canonical exon/intron junction sequence (GT...AG) into account to predict the correct exon boundaries, and genewise also determines and checks the intron phase at the putative junctions. Before genewise is applied, the cDNA of the gene has to be translated into the corresponding protein sequence, for example with the transeq program of the EMBOSS package (Rice et al. 2000). If the gene contains introns, genewise reports the exon positions with respect to the protein sequence, so these numbers need to be converted into positions on the cDNA sequence. An example genewise output is shown in Figure 2.
Figure 2: List of exons and alignment in the output of genewise. A: exon positions; A1: exon positions on the cDNA; A2: exon positions on the genomic sequence; B: spliced alignment (partially shown); B1: place of intron.
Having determined the positions of the exons flanking one or two introns, the primers are designed on the cDNA sequence as described above. However, since the length of the targeted intron is not known, the size of the PCR product can only be estimated from the length of the orthologous intron.
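The conversion of exon positions from protein to cDNA coordinates, and the resulting product-size estimate, can be sketched as below. The sketch assumes that the protein was obtained by translating the cDNA in forward frame 1, 2 or 3, so that residue i starts at cDNA base frame + 3(i − 1). It works at codon level: when an intron interrupts a codon (intron phase 1 or 2, which genewise reports), the true junction is shifted by one or two bases, and this simplified sketch does not model that. Function names are hypothetical.

```python
from typing import Iterable, Tuple


def residue_to_cdna(residue: int, frame: int = 1) -> Tuple[int, int]:
    """1-based cDNA span (first, last base) of the codon for a 1-based residue."""
    if frame not in (1, 2, 3):
        raise ValueError("only forward frames 1-3 are handled in this sketch")
    first = frame + 3 * (residue - 1)
    return first, first + 2


def exon_to_cdna(aa_start: int, aa_end: int, frame: int = 1) -> Tuple[int, int]:
    """cDNA span covered by an exon reported as a residue range."""
    return residue_to_cdna(aa_start, frame)[0], residue_to_cdna(aa_end, frame)[1]


def estimated_product_size(fwd_start: int, rev_end: int,
                           intron_lengths: Iterable[int]) -> int:
    """cDNA amplicon length (5' end of the forward primer to the 5' end of the
    reverse primer, inclusive) plus the orthologous intron length(s)."""
    return (rev_end - fwd_start + 1) + sum(intron_lengths)
```

For example, exons reported as residues 1-82 and 83-170 in frame 1 map to cDNA bases 1-246 and 247-510, and a forward primer starting at base 120 with a reverse primer ending at base 400, across a 350 bp orthologous intron, gives an estimated product of 631 bp, inside the preferred 300–2000 bp window. The estimate is only as good as the assumption that the intron lengths are conserved between the two species.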
Genomic sequence from the target species and cDNA from a helper species
If a cDNA database exists for the helper species, either blastn or tblastx can be used as above to find the orthologous/homologous pair of cDNA and genomic sequences. The targeted genomic region and its cDNA orthologue are then aligned as above, and the primer pair is designed for an exon pair in the genomic sequence of the species of interest.
Web applications to help primer design
We developed two web applications to facilitate the last two steps of the primer design process described above. One uses the sim4 program to align the cDNA sequence of a gene to its genomic counterpart, while the other runs the genewise program to align the protein encoded by the cDNA to a homologous genomic sequence from another species. In both cases the corresponding pair of cDNA and genomic sequences must be identified beforehand. The two sequences can either be pasted into the input form or uploaded from sequence files (in FASTA format). The software first determines the exon positions; the user then selects the two exons that the Primer3 program will use to design the forward and reverse primers, respectively. The online applications are available at http://bioinformatics.abc.hu/itprim/.

4.1.2. PCR amplification protocols
Specific amplification using Pfu enzyme
The PCR reaction mix consists of 10 pmol each of forward and reverse primer, 1 U Pfu enzyme, 2 mM MgSO4, 10 mM (NH4)2SO4, 20 mM Tris-HCl (pH 8.8), 10 mM KCl, 0.1% Triton X-100, 0.1 mg/ml BSA, 0.75 mM activated calf thymus DNA, 200 μM of each dNTP, and 25 ng total DNA of the individual.
*Pfu enzyme incorporates nucleotides at 70-80 °C and is more thermostable than Taq polymerase.
**Pfu enzyme requires MgSO4 in the PCR instead of MgCl2.
***Pfu is eight times more accurate than Taq polymerase.
Specific amplification using Taq enzyme
The PCR reaction mix consists of 10 pmol each of forward and reverse primer, 1 U Taq polymerase, 1.5 mM MgCl2, 200 μM of each dNTP, and 25 ng total DNA of the individual in 1x Taq polymerase buffer, in a final volume of 25 μl.
RAPD amplification
The reaction mix consists of 10 pmol 10-mer random primer, 1 U Taq polymerase, 2.4 mM MgCl2, 200 μM of each dNTP, and 25 ng total DNA of the individual in 1x Taq polymerase buffer, in a final volume of 25 μl.
* To improve amplification success, in some cases a MgCl2 or MgSO4 gradient between 1 mM and 2.5 mM is advised.
4.1.3. PCR programs
Gradient PCR programs
When the optimal working conditions of the primers are unknown, gradient amplification is advised: 35 cycles of 30 s at 94 °C, 1 min at an annealing temperature that decreases from column to column by a predetermined number of degrees (e.g. 60-58-56-54-52-50-48-46 °C), and 1 or 2 min at 72 °C, followed by a final extension at 72 °C for 4 min.
Touchdown amplification programs can be used for RAPD amplification or when the annealing temperature of a primer is unknown. Start with 2 cycles of 30 s at 94 °C, 1 min at 60 °C and 1 min at 72 °C, and decrease the annealing temperature by 2 °C after every second cycle.
Continue until the lowest annealing temperature considered useful is reached, then apply at least 30 cycles of amplification at that temperature, followed by a final extension at 72 °C for 4 min. RAPD amplification is in fact a touchdown program reaching 37 °C as the lowest annealing temperature.

4.2. The candidate gene direct sequencing and SNP discovery method
This method focuses on mapping candidate genes in M. truncatula and other legume species. Putative orthologous sequences are searched for and sequenced in the parents of mapping populations in order to develop SNP markers.
4.2.1. If candidate gene sequences are available in other legume species
Retrieve sequences for the genes of interest from the GenBank and EMBL databases and design primers to amplify 0.3-3.0 kb fragments, depending on the length and type of sequence available (genomic DNA or cDNA). PCR reactions are carried out in a total volume of 25 μl containing 20 ng template genomic DNA, 0.2 μM of each primer, 0.2 mM dNTP, 1.5 mM MgCl2, 1X Taq buffer and 1.5 units Taq polymerase. After an initial 3 min denaturation step at 94 °C, 35 cycles of 50 s denaturation at 92 °C, 50 s annealing at the required (locus-dependent) Tm and 3 min elongation at 72 °C are performed, followed by a final 5 min elongation step at 72 °C. PCR products are purified from 1-2% agarose gels using the NucleoSpin gel-extraction kit (Macherey-Nagel, Düren, Germany) and sequenced directly. The sequences should be aligned with ClustalW (http://www.infobiogen.fr/services/analyseq/cgibin/clustalw_in.pl) and screened for insertions/deletions and/or SNPs among the parents.
Putative orthologous genes should then be searched for in Medicago truncatula. Several strategies are possible. In some cases, the same primer pairs used in other legume species can amplify the genomic DNA of the M. truncatula parental lines.
The PCR conditions used for the other legume species should be tried first, then optimized to obtain a single band in the electrophoretic profile. When there is no amplification, orthologous sequences should be searched for in the M. truncatula EST databases (http://medicago.toulouse.inra.fr/Mt/EST/ or http://www.tigr.org/tigr-scripts/tgi/T_index.cgi?species=medicago), and specific primers designed for M. truncatula. The amplification products are sequenced directly and screened for polymorphism between the M. truncatula parental lines. In the remaining cases, putative orthologous genes can be searched for in M. truncatula BAC sequences, together with their linkage group assignment and position when available, at http://www.medicago.org/genome/.
4.2.2. Starting from mapped M. truncatula gene markers to design putative orthologous gene markers in other legume species
EST-derived microsatellite markers have been designed and mapped in M. truncatula (Jemalong x DZA315.16 genetic map, T. Huguet, http://medicago.toulouse.inra.fr/Mt/GeneticMAP/LR4_MAP.html). The physical map and sequences can also be searched at http://www.medicago.org/genome/. The EMBL database should then be searched for homologous sequences in other legumes. Where good homology is found, primer pairs can be designed to amplify, sequence and detect polymorphism between the parental genotypes of the other legume species.

5. Polymorphism detection
When no sequence information is available for the parents of the mapping populations (the IT method), different polymorphism detection techniques are used successively (7.1., then 7.2. and/or 7.3). When sequence information is available (candidate gene direct sequencing and SNP discovery method), the detection strategy can be adapted to the type of polymorphism revealed by sequence analysis.
5.1. Length and single-dose polymorphism detection by agarose gel electrophoresis
After PCR amplification, loading buffer is added to the samples (5 μl to the 25 μl reaction volume), which are then run on agarose gels of suitable concentration. The loading buffer contains 50 mM Tris-HCl pH 8.0, 40% sucrose, 10 mM EDTA pH 8.0 and 0.05% bromophenol blue. The agarose concentration is chosen according to the length of the PCR fragments (Table 1). Ethidium bromide is added to the agarose gel (50 μg EB/100 ml agarose gel) or to the running buffer to visualize the double-stranded linear DNA.
Safety: Ethidium bromide is mutagenic; wear gloves when handling the stock and any solution or gel that contains ethidium bromide.
Table 1. The range of separation of linear double-stranded DNA on agarose and polyacrylamide gels.
Agarose gel (%)   Range of separation (bp)   Polyacrylamide gel (%)   Range of separation (bp)
0.5               1000-30000                 3.5                      100-1000
0.7               800-12000                  5.0                      80-500
1.0               500-10000                  8.0                      60-400
1.2               400-7000                   12.0                     40-200
1.5               200-4000                   20.0                     5-100
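Choosing a gel from Table 1 is a simple range lookup. The Python sketch below returns every agarose or polyacrylamide percentage whose separation range covers all the given fragment lengths (for example both alleles of a length-polymorphic marker). It transcribes the table ranges, taking 1000–30000 bp for 0.5% agarose (the commonly quoted range); the function name is hypothetical.

```python
from typing import Dict, List, Tuple

# (gel %, (min bp, max bp)) transcribed from Table 1
AGAROSE: List[Tuple[float, Tuple[int, int]]] = [
    (0.5, (1000, 30000)),
    (0.7, (800, 12000)),
    (1.0, (500, 10000)),
    (1.2, (400, 7000)),
    (1.5, (200, 4000)),
]
POLYACRYLAMIDE: List[Tuple[float, Tuple[int, int]]] = [
    (3.5, (100, 1000)),
    (5.0, (80, 500)),
    (8.0, (60, 400)),
    (12.0, (40, 200)),
    (20.0, (5, 100)),
]


def suitable_gels(*fragment_bp: int) -> Dict[str, List[float]]:
    """Gel percentages whose separation range includes every fragment length."""
    if not fragment_bp:
        raise ValueError("give at least one fragment length")

    def covering(table: List[Tuple[float, Tuple[int, int]]]) -> List[float]:
        return [pct for pct, (lo, hi) in table
                if all(lo <= f <= hi for f in fragment_bp)]

    return {"agarose": covering(AGAROSE),
            "polyacrylamide": covering(POLYACRYLAMIDE)}
```

A 300 bp product falls only in the 1.5% agarose range but in three polyacrylamide ranges, while alleles of 450 and 520 bp are both covered by 1.2% and 1.5% agarose. Whether two close alleles are actually resolved depends on their size difference as well, which the table does not capture.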